SCAND3 and KRBA2 are two mammalian proteins originally described as "cellular-integrases" due to sharing of a similar
DDE-type integrase domain whose origin and relationship with other recombinases remain unclear. Here we perform 
phylogenetic analyses of 341 integrase/transposase sequences to reveal that the integrase domain of SCAND3 and 
KRBA2 derives from the same clade of GINGER2, a superfamily of cut-and-paste transposons widely distributed in insects 
and other protostomes, but seemingly absent or extinct in vertebrates. Finally, we integrate the results of phylogenetic
analyses with the taxonomic distribution of SCAND3 and KRBA2 and their transposon relatives to discuss some of the
processes that promoted the emergence of these two chimeric genes during mammalian evolution.



Results 

Mobile genetic elements (MGEs) are abundant selfish components of living
organisms that can sometimes provide new substrate for the enrichment of
host genome complexity by co-option of their coding or non-coding
components (see refs. 1 and 2). The process of co-option, also known as
exaptation or domestication, is not always straightforward to investigate
as it frequently involves events of recombination, gene fusion and/or
exonization (the creation of a new exon as a result of mutations in
intronic sequences), 3-5 making it particularly difficult to annotate and
interpret these types of gene structures in sequencing projects.
Along these lines, in 2005, two groups of DDE-like cellular integrases
(C-INTs) were described in diverse eukaryotes. 6 Due to the similarity of
these sequences with the INTs of LTR retroelements, it was initially
hypothesized that they evolved from the latter to serve cellular
functions. Further studies revealed that one of these two C-INT groups
defines a new type of transposases (TPases) encoded by a previously
unrecognized subclass of eukaryotic MGEs called Mavericks 7,8 or
Polintons. 9
As for the other C-INT group, genomic annotations subsequently disclosed
two single copy gene variants (restricted to humans and other eutherian
mammals) formally called SCAND3 and KRBA2 in GenBank. 10 The predicted
products of these two genes share an INT core but can be distinguished
from one another by several features. In the predicted SCAND3 protein
(also known as ZNF452), the INT core is flanked at the N-terminus by a
SCAN zinc-finger domain, 11 and, at the C-terminus, by a TPase-derived
hATd dimerization module similar to that of the hobo-Activator-Tam3 (hAT)
superfamily. 12,13 In the KRBA2 protein,
a distinct zinc-finger Krüppel-associated box (KRAB) domain 14 is
predicted to precede the INT domain, while hATd and SCAN domains are not
detected. The availability of multiple ESTs for the two SCAND3 and KRBA2
gene variants, and strong selective constraints acting on their coding
sequence, 6 suggest that they are functional genes with an unknown
cellular role. The two N-terminal zinc-finger domains (i.e., SCAN and
KRAB) of their encoded products are typically associated with
transcriptional regulation of gene expression. It is known that the KRAB
domain acts as a platform to recruit transcriptional repressor complexes
(including histone deacetylases) involved in maintenance of the nucleolus,
cell differentiation, cell proliferation, apoptosis and
neoplastic transformation. 14-16

Correspondence to: Carlos Llorens; Email: carlos.llorens@uv.es
Submitted: 10/11/2012; Revised: 11/15/2012; Accepted: 11/15/2012
http://dx.doi.org/10.4161/mge.22914
www.landesbioscience.com
Mobile Genetic Elements

In addition, recent studies have identified KRAB
enzymes acting in transcriptional repression of exogenous and endogenous
retroviruses. 17,18 No precise cellular function has been directly
assigned to SCAND3, but its N-terminal SCAN domain is known to play a role
in transcriptional regulation of genes involved in metabolism, cell
survival and differentiation. 11,19 The SCAN domain often co-exists with
KRAB in diverse transcription factors, 19 and it is thought to be
evolutionarily derived from the gag-like proteins encoded by LTR
retroelements. 20,21 In this article we analyze the transposon-derived
origins of the C-INT group constituted by SCAND3 and KRBA2, which based on
their respective N-terminal domains were formally classified as putative
transcription factors along with other mammalian proteins (for a more
detailed review, see ref. 19). Taking
this into primary consideration, the term SCAN/KRAB C-INTs will thus be
used throughout this article when collectively referring to SCAND3 and
KRBA2. The largest component of SCAN/KRAB C-INTs is their common INT
domain, thought to be related to LTR retroelement INTs and/or to
Maverick/Polinton TPases. 6,7 However, in a previous effort to annotate
LTR retrotransposons in the pea aphid genome, 22,23 we found high
similarity (e-value < 1e-60 in BLAST searches) between SCAN/KRAB C-INTs
and a group of poorly characterized TPases distantly related to LTR
retroelement INTs and the Maverick/Polinton TPases. This preliminary
observation prompted us to investigate in more detail the evolutionary
history of SCAN/KRAB C-INTs and clarify their relationship to other INTs
and TPases (IN/TPases).
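
The similarity screen described above amounts to keeping only the BLAST hits whose e-value is at least as significant as a chosen cutoff. The following is a minimal sketch of that filtering step, assuming the search results were saved in the 12-column BLAST tabular format (`-outfmt 6`). The function names, the sequence identifiers and every number in `EXAMPLE` are hypothetical and are not data from this study.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float
    bitscore: float


def parse_outfmt6(text: str) -> List[Hit]:
    """Parse BLAST tabular output with the 12 default columns of -outfmt 6."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        # Default column order: qseqid sseqid pident length mismatch gapopen
        # qstart qend sstart send evalue bitscore
        hits.append(Hit(cols[0], cols[1], float(cols[10]), float(cols[11])))
    return hits


def significant_hits(hits: List[Hit], max_evalue: float = 1e-60) -> List[Hit]:
    """Keep hits with e-value <= cutoff (smaller means stronger), best first."""
    kept = (h for h in hits if h.evalue <= max_evalue)
    return sorted(kept, key=lambda h: h.evalue)


# Hypothetical rows for illustration only; not results from the article.
EXAMPLE = "\n".join(
    "\t".join(row)
    for row in [
        ["query_INT", "toy_TPase_A", "38.5", "300", "180", "5",
         "1", "298", "10", "309", "2e-75", "250"],
        ["query_INT", "toy_TPase_B", "25.0", "120", "85", "4",
         "40", "158", "3", "120", "4e-08", "52.1"],
        ["query_INT", "toy_TPase_C", "36.1", "295", "184", "6",
         "2", "295", "15", "308", "7e-63", "212"],
    ]
)

for hit in significant_hits(parse_outfmt6(EXAMPLE)):
    print(hit.subject, hit.evalue)
```

The cutoff is a parameter because an e-value depends on database size, so a threshold suited to one database is not automatically comparable across others.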

Phylogenetic analysis based on all IN/TPase groups known to date that are
related to the IN/TPase domain of SCAND3 and KRBA2 C-INTs (Supplemental
Materials 1 and 2), using the maximum likelihood (ML) method, 24 resulted
in a tree provided as Supplemental Material 3. Overall, the phylogenetic
relationship of the major INT/TPase groups was consistent with what was
previously known (see refs. 25 and 26 or visit the GyDB Project 27 for
more detailed information). Moreover, the phylogeny places with high
statistical support the INT/TPase cores of SCAND3 and KRBA2 as two sibling
clades within the major group of GINGER2 TPases.
Fob1p (a fungal host protein) and Tdd-like elements (amoebozoan
transposons) also fall within the GINGER2 branch but both appear to be
more distantly related to GINGER2 TPases. Figure 1A provides further
support for this scenario through an ML phylogenetic tree based only on
the GINGER2 branch and its relatives, including SCAND3 and KRBA2. In this
analysis, the latter form two sister clades clustering with diverse
GINGER2 IN/TPases from hemipteran insects such as Acyrthosiphon pisum and
Rhodnius prolixus, and from coleopteran and lepidopteran insects such as
Tribolium castaneum and Bombyx mori. Consistent with these results,
comparative analyses based on tBLASTn searches 28 against the whole-genome
shotgun (WGS) (and other) databases of NCBI revealed strong levels of
similarity between SCAND3 and KRBA2 and the full-length translated protein
of insect GINGER2 transposons (e-values as significant as 10e-90, see also
alignment in Fig. 1B). In summary, and contrary to previous
hypotheses, 6,7 the IN/TPase core domain of SCAN/KRAB genes does not
appear to be derived from either Maverick or LTR elements but from the
domestication of a full-length GINGER2 transposase probably recruited by
horizontal transfer from insect to mammal (see below).
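
Statistical support for clades in an ML tree is commonly expressed as bootstrap percentages, as in Figure 1. The sketch below illustrates the generic non-parametric bootstrap idea only: alignment columns are resampled with replacement, a tree would be re-inferred from each replicate, and support is the percentage of replicate trees that recover a clade. It is not the authors' pipeline. The tree-inference step is deliberately left out, and the sequence names, toy alignment and replicate clade sets are all hypothetical.

```python
import random
from typing import Dict, FrozenSet, Iterable, List, Set


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Return one bootstrap replicate: columns resampled with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    # Draw as many column indices as the alignment has columns.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in columns) for name in names}


def clade_support(replicate_clades: List[Set[FrozenSet[str]]],
                  clade: Iterable[str]) -> float:
    """Percentage of replicate trees (given as sets of clades) containing `clade`."""
    target = frozenset(clade)
    found = sum(1 for clades in replicate_clades if target in clades)
    return 100.0 * found / len(replicate_clades)


# Toy alignment, hypothetical sequences for illustration only.
TOY = {"seqA": "MKLVDE", "seqB": "MKIVDE", "seqC": "MRLVNE"}
replicate = bootstrap_alignment(TOY, random.Random(0))

# Hypothetical clade sets, standing in for trees inferred from four replicates.
REPLICATE_CLADES = [
    {frozenset({"seqA", "seqB"})},
    {frozenset({"seqA", "seqB"})},
    {frozenset({"seqB", "seqC"})},
    {frozenset({"seqA", "seqB"})},
]
print(clade_support(REPLICATE_CLADES, ["seqA", "seqB"]))
```

In practice each replicate alignment would be passed to an ML tree-building program, and the clades of the resulting trees would replace `REPLICATE_CLADES`.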

SCAND3 is not only derived from a GINGER2 transposon but also from
components of two additional MGEs. The first MGE-derived component is the
N-terminal SCAN domain, which has recently been demonstrated to have
evolved from the gag gene of LTR retroelements. 20,21 In addition, the
C-terminus of SCAND3 is derived from a distinct transposase (Charlie10) of
the hAT superfamily most closely related to the Spin/Buster
subgroup. 25 In an attempt to discuss the timing and mechanisms of
assembly of SCAND3 and KRBA2 in mammals, we integrated the results of our
phylogenetic analysis and the distribution of these two genes onto a
simplified tree of life (Fig. 2). According to this
reconstruction, the presence of SCAND3 and KRBA2 in almost all eutherian
genomes (including the basal mammalian subgroups, afrotherians and
xenarthrans) and the detection of SCAND3 in at least the wallaby suggest
that SCAN/KRAB C-INTs (or at least SCAND3) might have originated prior to
the eutherian-metatherian split (145-65 million years ago 30,31). Thus,
SCAND3 and KRBA2 are restricted to therian mammals (eutherians and
metatherians) and their respective phylogenies follow that expected for
single copy genes vertically inherited from a therian ancestor. We found
no trace of either gene in Mus musculus and Rattus norvegicus, which
suggests that they were both lost in the lineage of murine rodents.
Additional screenings performed on Ensembl databases suggest that while
Cavia porcellus (guinea pig) apparently lacks both genes, Ictidomys
tridecemlineatus (the squirrel) has preserved both genes (according to the
assembly spetri2 accessions JH393724.1 and JH393405.1 for SCAND3 and
KRBA2, respectively). These observations suggest that while the Sciuridae
(represented by the squirrel) preserve intact copies of both genes, the
Muroidea (rats, mice, etc.) and apparently the Hystricomorpha (the guinea
pig) lost SCAND3 and KRBA2, probably after the Rodentia split into its
diverse suborders. Interestingly, the draft genome sequence of Loxodonta
africana (the elephant) contains a single copy of KRBA2 but at least three
SCAND3 copies distributed in three different genomic scaffolds (for more
details see methods), thus suggesting lineage-specific triplication of
this gene. The situation
in metatherians (marsupials) is less clear 
because draft genome sequences are 
available for only two species: Monodelphis
domestica (opossum) and Macropus 
eugenii (wallaby). The former apparently 
lacks both genes, while we detected a 
copy of SCAND3 (still incomplete, as 
the information is based on a truncated 
sequence containing the SCAN domain 
and the INT/TPase central core) in the 
wallaby. Therefore, it remains unclear 
whether both genes are truly missing in 
opossum and whether the SCAND3- 
like sequence from wallaby represents an 
ortholog of the gene seen in eutherians. 
Further genomic data for marsupials will help to determine with more
precision the evolutionary origin of both genes in mammals, for example
whether SCAND3 is older than KRBA2.

Figure 1. GINGER2 origin of SCAND3 and KRBA2 based on their IN/TPase domain. (A) Inferred phylogenetic ML tree based on GINGER2, SCAN/KRAB,
Fob1 and Tdd-like IN/TPases using the latter as an outgroup clade. To the left, the name or acronym of each analyzed sequence is detailed and
accompanied by host genome information. Bootstrap values up to 60 are detailed in the figure. (B) Multiple alignment showing the strong similarity
between GINGER2 and SCAN/KRAB IN/TPases. The typical INT core shared with other IN/TPases is delimited by arrows.

Regarding the
disclosed transposon counterpart of the SCAND3 SCAN/KRAB IN/TPase,
database searches suggest that GINGER2 transposons are common in
protostomes and cnidarians and that they also occur in some basal marine
deuterostomes and a few protists. We did not detect any GINGER2 transposon
sequence in any vertebrate taxa (except the SCAN/KRAB IN/TPase domain). In
turn, Spin/Buster elements are found in a wide range of metazoans,
although Charlie10 is so far a Spin/Buster element specific to mammals.
Similarly, the distribution of SCAN and KRAB domains is known to be
restricted to vertebrates. 19 Together,
these observations suggest that the common SCAN/KRAB IN/TPase might derive
from an ancient horizontal transfer of a GINGER2 transposon, most likely
acquired from an insect, to the common therian ancestor, while the SCAN
and KRAB domains as well as the hAT Buster TPase of SCAND3 were
contributed by the mammalian host genome. From that point
on, we speculate that a plausible gene duplication of the ancestral
GINGER2 precursor (note that the two SCAND3 and KRBA2 IN/TPase domains are
siblings) triggered a subsequent process of exonization. Supporting our
hypothesis, annotations of KRBA2 in distinct genomes usually reveal two
exons. The first includes the KRAB domain and the second encodes the whole
IN/TPase domain. Similarly, SCAND3 is arranged in four exons where exon 1
and exon 2 contain the SCAN domain, while exon 3 contains the entire
IN/TPase domain and exon 4 the full-length Spin/Buster TPase (for a
representation, see Supplemental Material 3). Indeed, exonization of MGEs
has been shown to be an interesting mechanism for the enrichment of
several mammalian genomes that might be involved, or at least correlated,
with diverse events of speciation. Along these lines, the processes of
transposon exonization that apparently led to the emergence of SCAN/KRAB
C-INTs constitute an interesting example of how nature has been able to
shape the complexity of mammals during evolution by co-opting and mixing
full-length MGEs from distinct sources to acquire new functionalities. The
question concerning SCAND3 and KRBA2 is, which functionalities?
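
The dating argument used in this section (a gene detected in both eutherians and a metatherian is most simply explained by an origin before their split) can be expressed as finding the deepest clade shared by all species in which the gene was detected. The following minimal sketch shows that reasoning under stated assumptions: the lineages are heavily simplified rank lists written for illustration, the function name is hypothetical, and absences are ignored because a missing gene in a draft assembly may reflect either true loss or incomplete sequence.

```python
from typing import Dict, List, Optional

# Simplified, illustrative lineages (root first); not a complete taxonomy.
LINEAGES: Dict[str, List[str]] = {
    "human": ["Mammalia", "Theria", "Eutheria", "Boreoeutheria"],
    "squirrel": ["Mammalia", "Theria", "Eutheria", "Boreoeutheria", "Rodentia"],
    "elephant": ["Mammalia", "Theria", "Eutheria", "Afrotheria"],
    "wallaby": ["Mammalia", "Theria", "Metatheria"],
}


def deepest_shared_clade(taxa: List[str],
                         lineages: Dict[str, List[str]]) -> Optional[str]:
    """Return the most specific clade common to all listed taxa, if any."""
    shared = None
    # Walk down from the root while every lineage agrees on the clade name.
    for ranks in zip(*(lineages[t] for t in taxa)):
        if len(set(ranks)) != 1:
            break
        shared = ranks[0]
    return shared


# Detections as described in the text: SCAND3 in eutherians and the wallaby,
# KRBA2 in eutherians only.
scand3_carriers = ["human", "squirrel", "elephant", "wallaby"]
krba2_carriers = ["human", "squirrel", "elephant"]

print("SCAND3:", deepest_shared_clade(scand3_carriers, LINEAGES))
print("KRBA2:", deepest_shared_clade(krba2_carriers, LINEAGES))
```

Using presence data alone gives only a minimum age for each gene. If KRBA2 were later found in a marsupial, its inferred origin would move back to the therian ancestor as well.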

Figure 2. Taxonomic host distribution of SCAN/KRAB cellular-integrases and their related transposons integrated in a simplified representation
of the tree of life. Branches are not to scale. GINGER2 and related elements are summarized in blue, while those of Buster elements are in green and SCAND3 and
KRBA2 are red. To implement information about Spin/Buster transposons based on the survey of distinct works 2 '- 35 43 or performed distinct BLASTp or

The fact that both SCAN and KRAB domains can be typically associated with
transcriptional regulation of gene expression suggests that SCAND3 and
KRBA2 are transcription factors. However, SCAN and KRAB are only the
N-terminal domains of SCAND3 and
KRBA2, which are mainly defined by a common GINGER2 core and, in the case
of SCAND3, by an additional Spin/Buster core. The high degree of
preservation of these transposase cores suggests that SCAN/KRAB C-INTs are
indeed functional genes. Such functionality does not, however, appear to
be based on conventional "cut-and-paste" transposition activity, as in
that case we would expect to find more than one SCAND3 and KRBA2 copy per
host genome. Therefore, the term C-INTs coined by Gao and Voytas (ref. 6)
is appropriate when talking about SCAND3 and KRBA2. Among the
distinct cellular functions we might assign to these enzymes, the most
attractive one is perhaps a defensive role against the recombinant
activities of other mobile elements. On the one hand, this hypothesis is
supported by previous works revealing that DDE enzymes can play diverse
roles in acquired and innate immunity (see for instance ref. 32). On the
other hand, it has been shown that several KRAB-carrying proteins play a
role in transcriptional regulation of exogenous and endogenous
retroviruses. 17,18 Moreover, a recent
study 33 reported a positive correlation between the number of LTR
retroelements and the number of tandems of zinc-finger genes (most of
these are carriers of KRAB and/or SCAN domains) across vertebrate genomes.
The idea of a defensive role against retroviral infections is indeed
plausible and has also been proposed when speculating about the existence
in vertebrates of other single-copy GINGER1-like genes such as GIN1, 34
GIN2, 26,35 and CGIN1. 36
There is no evidence as to whether the two genes examined herein might
have a similar role in mammals. Along similar lines, domesticated
transposases such as Metnase/SETMAR 37 and Fob1 38 have been shown to play
distinct roles in DNA repair in the genomes of primates and yeasts,
respectively. The former has no relationship with SCAN/KRAB C-INTs, but
Fob1 not only shares evolutionary history with SCAN/KRAB C-INTs and
GINGER2 transposons but is also phylogenetically close to them in the
INT/TPase tree (see Supplemental Material 3 and ref. 6). Further
experimental studies on SCAN/KRAB C-INTs and other INT/TPases such as
GIN1, GIN2, and CGIN1 will be important to unravel novel and yet unknown
biological details concerning the co-evolution of mobile DNA and the
genomic complexity of their hosts.